Annotating Large Email Datasets for Named Entity Recognition with Mechanical Turk

Authors

Nolan Lawson

Kevin Eustice

Mike Perkowitz

Meliha Yetisgen-Yildiz

Abstract

Amazon's Mechanical Turk service has been successfully applied to many natural language processing tasks. However, the task of named entity recognition presents unique challenges. In a large annotation task involving over 20,000 emails, we demonstrate that a competitive bonus system and interannotator agreement can be used to improve the quality of named entity annotations from Mechanical Turk. We also build several statistical named entity recognition models trained with these annotations, which compare favorably to similar models trained on expert annotations.
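The abstract does not say how interannotator agreement is applied, so the following is only a minimal sketch of one common approach, assuming token-level labels and a simple majority vote across workers. The function name `aggregate_token_labels`, the `min_agreement` threshold, the `PER`/`ORG`/`O` label set, and the sample sentence are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def aggregate_token_labels(worker_labels, min_agreement=0.5):
    """Merge per-token labels from several workers by majority vote.

    worker_labels: list of label sequences, one per worker, all the same length.
    min_agreement: a label is kept only if strictly more than this fraction
                   of workers chose it (assumed threshold, not from the paper).
    """
    if not worker_labels:
        return []
    n_tokens = len(worker_labels[0])
    if any(len(seq) != n_tokens for seq in worker_labels):
        raise ValueError("all workers must label the same tokens")

    n_workers = len(worker_labels)
    aggregated = []
    for labels_at_token in zip(*worker_labels):
        label, votes = Counter(labels_at_token).most_common(1)[0]
        # Fall back to "O" (no entity) when agreement is too low.
        aggregated.append(label if votes / n_workers > min_agreement else "O")
    return aggregated


# Hypothetical example: three workers label the same five tokens.
tokens = ["Please", "email", "Kevin", "at", "Acme"]
votes = [
    ["O", "O", "PER", "O", "ORG"],
    ["O", "O", "PER", "O", "O"],
    ["O", "O", "PER", "O", "ORG"],
]
result = aggregate_token_labels(votes)
```

Here two of the three workers tag "Acme" as an organization, so the label survives the vote. Raising `min_agreement` would trade recall for precision.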

Similar resources

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Annotating Named Entities in Twitter Data with Crowdsourcing

We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally been the focus of named entity experiments, Twitter is far more informal and abbreviated. The collected annotations and annotation techniques will provide a first step towards the full study of name...

Preliminary Experience with Amazon’s Mechanical Turk for Annotating Medical Named Entities

Amazon’s Mechanical Turk (MTurk) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this paper, we report our findings in using MTurk to annotate medical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We compared MTurk annotations with a gold standard manually created by a domai...

Preliminary Experiments with Amazon's Mechanical Turk for Annotating Medical Named Entities

Robust Logistic Regression using Shift Parameters

Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. This model can be trai...

Publication date: 2010
